Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add runtime validation in setAttribute #348

Conversation

jakemalachowski
Copy link
Contributor

@jakemalachowski jakemalachowski commented Dec 27, 2019

Fixes Issue #347

  • Added lists as an accepted attribute value data type per the OT spec
  • Added data type validation when setting an attribute on a span
    • Validates type is one of int, float, str, bool or list
    • If a value is of type list, validates all values in the list are of a homogeneous primitive data type, per the OT spec

I wasn't sure what the best way to enforce valid attribute values was. I decided to just drop invalid values since that's what was being done in the PR linked in the issue. But I was wondering if it made sense to throw an exception or maybe attempt to coerce the value into a valid type instead.

@jakemalachowski jakemalachowski requested a review from a team December 27, 2019 04:33
Copy link
Contributor

@ocelotl ocelotl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution and apologies for the late reply. Most of the team is out in an end of year break, but I'll look at this tomorrow 👍

@jakemalachowski
Copy link
Contributor Author

No problem at all. I know it's a slow time of year.

Copy link
Contributor

@ocelotl ocelotl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good in general, some changes requested 👍

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Outdated Show resolved Hide resolved
opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Outdated Show resolved Hide resolved
opentelemetry-sdk/tests/trace/test_trace.py Outdated Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Dec 29, 2019

Codecov Report

Merging #348 into master will increase coverage by 0.14%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #348      +/-   ##
==========================================
+ Coverage   84.99%   85.13%   +0.14%     
==========================================
  Files          38       38              
  Lines        1859     1911      +52     
  Branches      224      225       +1     
==========================================
+ Hits         1580     1627      +47     
- Misses        214      219       +5     
  Partials       65       65
Impacted Files Coverage Δ
opentelemetry-api/src/opentelemetry/util/types.py 100% <100%> (ø) ⬆️
...emetry-sdk/src/opentelemetry/sdk/trace/__init__.py 90.94% <100%> (+0.28%) ⬆️
...ry-ext-wsgi/src/opentelemetry/ext/wsgi/__init__.py 68.18% <0%> (ø) ⬆️
...ntelemetry-api/src/opentelemetry/trace/__init__.py 84.56% <0%> (+0.44%) ⬆️
...elemetry-api/src/opentelemetry/metrics/__init__.py 87.93% <0%> (+1.93%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 269b006...ad58890. Read the comment docs.

@jakemalachowski
Copy link
Contributor Author

Implemented your suggested changes. I am now just waiting on your feedback with regards to where the validation logic should live. I just made it a private method for now. However, considering that there aren't any other custom private methods in the file, I am assuming it belongs elsewhere.

Thanks for your feedback so far.

Copy link
Contributor

@ocelotl ocelotl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good, just a minor request to use is not.

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Outdated Show resolved Hide resolved
opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Outdated Show resolved Hide resolved
opentelemetry-sdk/tests/trace/test_trace.py Show resolved Hide resolved
@jakemalachowski jakemalachowski requested review from ocelotl and removed request for a team December 31, 2019 00:13
@ocelotl
Copy link
Contributor

ocelotl commented Jan 1, 2020

Looking good! Please request your Linux Foundation CLA permissions so that I can approve. 👍

@jakemalachowski
Copy link
Contributor Author

I signed it

@jakemalachowski
Copy link
Contributor Author

@ocelotl All checks are now passing. Thanks for the guidance on all this.

Copy link
Contributor

@ocelotl ocelotl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Copy link
Member

@toumorokoshi toumorokoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few code conciseness and naming comments that I think we should resolve before merging this in.

Thanks for working on this! The tests and code overall looks good.

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Outdated Show resolved Hide resolved
opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Outdated Show resolved Hide resolved
@@ -208,8 +209,38 @@ def set_attribute(self, key: str, value: types.AttributeValue) -> None:
if has_ended:
logger.warning("Setting attribute on ended span.")
return

if not isinstance(value, (int, float, bool, str, list, tuple)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this int, float be consolidated into Number here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I use Number rather than int, float here, I will need to change it to:

if not isinstance(value, (bool, str, list, tuple)) and not issubclass(type(value), Number):

I think just using int, float here is more concise, but I can understand the appeal of being consistent with what is done in _check_attribute_value_sequence. What do you think?

Copy link
Member

@toumorokoshi toumorokoshi Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isinstance seems to work fine:

$ python -c "from numbers import Number; print(isinstance(1, (bool, Number)))"
True

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, please consider using collections.abc.Sequence instead of list, tuple.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if using Number here, also use it in the AttributeValue type for consistency (and note that this allows Decimal and Fraction numbers too).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toumorokoshi I'm not sure why my tests were failing after originally making this change. Just tried this again and it works.


for element in sequence:

if not isinstance(element, (bool, str, Number)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this tuple be moved into a constant in the module? This set should be shared with the tuple on line 213 (could modify line 213 to be that list + list, tuple:

VALID_ATTRIBUTE_TYPES + (list, tuple)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this should be a constant. But do you think it might be misleading to have the constant VALID_ATTRIBUTE_TYPES without list, tuple included, since they are actually valid?

Maybe we could define VALID_ATTRIBUTE_SEQUENCE_TYPES and VALID_ATTRIBUTE_NON_SEQUENCE_TYPES, then define VALID_ATTRIBUTE_TYPES as the union of the two. What do you think?

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Outdated Show resolved Hide resolved
opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Outdated Show resolved Hide resolved
opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Outdated Show resolved Hide resolved
if return_code is not None:
logger.warning("%s in attribute value sequence", return_code)
return

self.attributes[key] = value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a potential edge case where AttributeValues that are lists can be mutated afterward, resulting in invalid types that exporters will run into.

I've added a followup ticket on that here: #352

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great catch. Would adding a copy of the list rather than the original list resolve this?

Copy link
Contributor

@ocelotl ocelotl Jan 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could store the copies of the sequence values in tuples. Now that you mention this, instead of accepting lists or tuples we should accept sequences.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think that's a good solution. Would be good to make a separate PR for that.

jakemalachowski and others added 3 commits January 3, 2020 07:30
…m:jakemalachowski/opentelemetry-python into ISSUE-347/attribue-value-type-enforcement
Co-Authored-By: Yusuke Tsutsumi <tsutsumi.yusuke@gmail.com>
Copy link
Member

@toumorokoshi toumorokoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good! One last reply that I think makes my suggestion possible, but let me know what you think.

Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, please split the change of the AttributeValue definition into a separate PR (I would "request changes" for this, but seeing as I have far too little time for OpenTelemetry-Python work recently, I don't want to hold up things).

Second, I am worried that the trade offs for this feature could be wrong. The benefit of this PR is that if the API is used wrongly, the user now gets a log message right at the point of the mistake instead of a (caught) exception later in the exporter, and the span can still be processed correctly (except for the wrong attribute).

However, for users which use the API correctly (maybe they even use mypy to check for these kinds of mistakes), we slow down setting attributes (a very common operation, probably the most-common even in the whole OpenTelemetry-Python API).

Change typing to prevent heterogeneous types in lists
@jakemalachowski
Copy link
Contributor Author

@Oberon00 in response to your performance concerns:

You bring up a valid point. Is it specifically the list value validation that you see as a performance concern or do you think the isinstance(value, (bool, str, Number, Sequence) is also a bad trade off? I think keeping the first isinstance check but removing the validation on the list values might strike a good balance, but I will defer to the opinions of people more familiar with the project here.

I'm okay with implementing this either way but it would be great to get some additional input on what's best here.

@carlosalberto
Copy link
Contributor

hey hey @Oberon00

I'm also thinking about this. I'm slightly in favor of not doing this validation, and letting exporters handle this. Of course, there's no perfect solution here, but a matter of trade offs. In any case, this is a choice that maintainers will have to take ;)

(And of course, the code can be later updated if/as needed).

@toumorokoshi
Copy link
Member

toumorokoshi commented Jan 14, 2020

@carlosalberto @Oberon00 to address performance concerns: I do agree that adding this check will lead to slightly more expensive span creation, at the gain of giving helpful feedback to the user if span attributes are not valid, and removing the concern of type correctness from the exporters.

It's very well possible that, in the future, we may need to remove of pare down this code. But I think this is a very easy check to remove after the fact, but a hard one to add. And real life use cases will probably better inform whether this will be a performance bottleneck.

from a rudimentary timeit benchmark, checking isinstance is roughly a 188ns operation:

# on an Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz
$ python -m timeit "isinstance('a', (str, bool, list))"
10000000 loops, best of 3: 0.188 usec per loop

Let me know if you're strongly against it, and we can discuss further. I'll leave this PR up for another day to allow more discussion, but I'm currently inclined to merge (also since @jakemalachowski put the effort in to author it).

Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only very slightly against the checks. 😃
And the argument that we can remove it later is actually a good one. For example if we decide that we want to allow exporters to support arbitrary value types, we can still do that later by removing the checks if they are in place. If we added the checks later, on the other hand, that would be a breaking change.

@@ -208,8 +209,36 @@ def set_attribute(self, key: str, value: types.AttributeValue) -> None:
if has_ended:
logger.warning("Setting attribute on ended span.")
return

if not isinstance(value, (bool, str, Number, Sequence)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this check is more lenient than the AttributeValue definition (Number vs (int, float)). Also note that it seems checking for isinstance(x, ABC-type) is more expensive:

In [5]: %timeit isinstance(3, (float, Number))
319 ns ± 6.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: %timeit isinstance(3, (float, int))
110 ns ± 2.79 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really interesting. Good optimization to keep in mind if this becomes a bottleneck in the future.

opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py Outdated Show resolved Hide resolved
@Oberon00 Oberon00 changed the title Validate attribute value data types before adding to span Update AttributeValue type definition and add runtime validation in setAttribute Jan 15, 2020
Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a technical perspective, I think this PR is fine now (apart from nit https://github.com/open-telemetry/opentelemetry-python/pull/348/files#r367960226). 👌

@jakemalachowski jakemalachowski changed the title Update AttributeValue type definition and add runtime validation in setAttribute Add runtime validation in setAttribute Jan 18, 2020
@toumorokoshi toumorokoshi merged commit 4fca8c9 into open-telemetry:master Jan 20, 2020
@toumorokoshi
Copy link
Member

Great, thanks! Congrats on your first merged PR!

mauriciovasquezbernal added a commit to kinvolk/opentelemetry-python that referenced this pull request Mar 4, 2020
4fca8c9 ("Add runtime validation in setAttribute (open-telemetry#348)") added a robust
attribute validation using numbers.Number to validate numeric types.
Although the approach is correct, it presents some complications because
Complex, Fraction and Decimal are accepted because they are Numbers. This
presents a problem to the exporters because they will have to consider all
these different cases when converting attributes to the underlying exporter
representation.

This commit simplifies the logic by accepting only int and float as numeric
values.
mauriciovasquezbernal added a commit to kinvolk/opentelemetry-python that referenced this pull request Mar 4, 2020
4fca8c9 ("Add runtime validation in setAttribute (open-telemetry#348)") added a robust
attribute validation using numbers.Number to validate numeric types.
Although the approach is correct, it presents some complications because
Complex, Fraction and Decimal are accepted because they are Numbers. This
presents a problem to the exporters because they will have to consider all
these different cases when converting attributes to the underlying exporter
representation.

This commit simplifies the logic by accepting only int and float as numeric
values.
toumorokoshi pushed a commit that referenced this pull request Mar 10, 2020
4fca8c9 ("Add runtime validation in setAttribute (#348)") added a robust
attribute validation using numbers.Number to validate numeric types.
Although the approach is correct, it presents some complications because
Complex, Fraction and Decimal are accepted because they are Numbers. This
presents a problem to the exporters because they will have to consider all
these different cases when converting attributes to the underlying exporter
representation.

This commit simplifies the logic by accepting only int and float as numeric
values.
srikanthccv pushed a commit to srikanthccv/opentelemetry-python that referenced this pull request Nov 1, 2020
* chore: add plugin developer guide

* chore: add a link to example plugin
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants